Goto

Collaborating Authors

 positive definite kernel


Kriging via variably scaled kernels

Audone, Gianluca, Marchetti, Francesco, Perracchione, Emma, Rossini, Milvia

arXiv.org Machine Learning

Classical Gaussian processes and Kriging models are commonly based on stationary kernels, whereby correlations between observations depend exclusively on the relative distance between scattered data. While this assumption ensures analytical tractability, it limits the ability of Gaussian processes to represent heterogeneous correlation structures. In this work, we investigate variably scaled kernels as an effective tool for constructing non-stationary Gaussian processes by explicitly modifying the correlation structure of the data. Through a scaling function, variably scaled kernels alter the correlations between data and enable the modeling of targets exhibiting abrupt changes or discontinuities. We analyse the resulting predictive uncertainty via the variably scaled kernel power function and clarify the relationship between variably scaled kernels-based constructions and classical non-stationary kernels. Numerical experiments demonstrate that variably scaled kernels-based Gaussian processes yield improved reconstruction accuracy and provide uncertainty estimates that reflect the underlying structure of the data


Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams

Neural Information Processing Systems

Algebraic topology methods have recently played an important role for statistical analysis with complicated geometric structured data such as shapes, linked twist maps, and material data. Among them, \textit{persistent homology} is a well-known tool to extract robust topological features, and outputs as \textit{persistence diagrams} (PDs). However, PDs are point multi-sets which can not be used in machine learning algorithms for vector data. To deal with it, an emerged approach is to use kernel methods, and an appropriate geometry for PDs is an important factor to measure the similarity of PDs. A popular geometry for PDs is the \textit{Wasserstein metric}.




Transformation

Neural Information Processing Systems

Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance ofimages. Kernels based onthemaximumsimilarity overagroup of transformations are not generally positive definite. Perhaps it is for this reason that they have not been studied theoretically. We address this lacuna and show thatpositivedefiniteness indeed holdswith high probabilityforkernels based on the maximum similarity in the small training sample set regime of interest, and that they do yield the best results in that regime.



Metric Transforms and Low Rank Representations of Kernels for Fast Attention

Neural Information Processing Systems

We introduce a new linear-algebraic tool based on group representation theory, and use it to address three key problems in machine learning.1. Past researchers have proposed fast attention algorithms for LLMs by approximating or replace softmax attention with other functions, such as low-degree polynomials. The key property of these functions is that, when applied entry-wise to the matrix $QK^{\top}$, the result is a low rank matrix when $Q$ and $K$ are $n \times d$ matrices and $n \gg d$. This suggests a natural question: what are all functions $f$ with this property? If other $f$ exist and are quickly computable, they can be used in place of softmax for fast subquadratic attention algorithms.


Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams

Neural Information Processing Systems

Algebraic topology methods have recently played an important role for statistical analysis with complicated geometric structured data such as shapes, linked twist maps, and material data. Among them, \textit{persistent homology} is a well-known tool to extract robust topological features, and outputs as \textit{persistence diagrams} (PDs). However, PDs are point multi-sets which can not be used in machine learning algorithms for vector data. To deal with it, an emerged approach is to use kernel methods, and an appropriate geometry for PDs is an important factor to measure the similarity of PDs. A popular geometry for PDs is the \textit{Wasserstein metric}.



Heterogeneous Attributed Graph Learning via Neighborhood-Aware Star Kernels

Huang, Hong, Yao, Chengyu, Chen, Haiming, Gao, Hang

arXiv.org Artificial Intelligence

Attributed graphs, typically characterized by irregular topologies and a mix of numerical and categorical attributes, are ubiquitous in diverse domains such as social networks, bioinformatics, and cheminformatics. While graph kernels provide a principled framework for measuring graph similarity, existing kernel methods often struggle to simultaneously capture heterogeneous attribute semantics and neighborhood information in attributed graphs. In this work, we propose the Neighborhood-Aware Star Kernel (NASK), a novel graph kernel designed for attributed graph learning. NASK leverages an exponential transformation of the Gower similarity coefficient to jointly model numerical and categorical features efficiently, and employs star substructures enhanced by Weisfeiler-Lehman iterations to integrate multi-scale neighborhood structural information. We theoretically prove that NASK is positive definite, ensuring compatibility with kernel-based learning frameworks such as SVMs. Extensive experiments are conducted on eleven attributed and four large-scale real-world graph benchmarks. The results demonstrate that NASK consistently achieves superior performance over sixteen state-of-the-art baselines, including nine graph kernels and seven Graph Neural Networks.